CARBON: A Counterfactual Reasoning based Framework for Neural Code Comprehension Debiasing
Previous studies have demonstrated that code intelligence models are
sensitive to program transformations, among which identifier renaming is
particularly easy to apply and effective: simply renaming one identifier in
the source code can make the models output completely different results.
Prior research generally mitigates the problem by generating more training
samples. Such an approach is less than ideal, since its effectiveness depends
on the quantity and quality of the generated samples. Different from these
studies, we adjust the models themselves to explicitly distinguish the
influence of identifier names on the results, called naming bias in this
paper, and thereby make the models robust to identifier renaming.
Specifically, we formulate the
naming bias with a structural causal model (SCM), and propose a counterfactual
reasoning based framework named CARBON for eliminating the naming bias in
neural code comprehension. CARBON explicitly captures the naming bias through
multi-task learning in the training stage, and reduces the bias by
counterfactual inference in the inference stage. We evaluate CARBON on three
neural code comprehension tasks, including function naming, defect detection
and code classification. Experimental results show that CARBON achieves
relatively better performance (e.g., +0.5% in F1 score on the function naming
task) than the baseline models on the original benchmark datasets, and a
significant improvement (e.g., +37.9% in F1 score on the function naming
task) on the datasets with identifiers renamed. The proposed framework
provides a causal view for improving the robustness of code intelligence
models.
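The counterfactual step at inference time can be pictured as subtracting a bias-only prediction from the full prediction. The sketch below is a minimal illustration of that idea, not CARBON's exact formulation; the separate bias head, the simple subtraction, and the alpha weight are assumptions.

```python
import torch

def debiased_logits(full_logits: torch.Tensor,
                    naming_only_logits: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Counterfactual debiasing sketch: remove the contribution that the
    identifier names alone would make to the prediction.

    full_logits        -- logits from the model given the whole program
    naming_only_logits -- logits from a bias head that sees only identifiers
    alpha              -- subtraction strength (assumed hyperparameter)
    """
    return full_logits - alpha * naming_only_logits

# Hypothetical example: the full model leans towards class 0 mostly because
# of the identifier names; after the subtraction the prediction flips.
full = torch.tensor([[2.0, 1.8, 0.1]])   # model(program)
bias = torch.tensor([[1.5, 0.2, 0.0]])   # bias_head(identifier names only)
print(debiased_logits(full, bias).argmax(dim=-1))  # tensor([1])
```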
Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors
Although the dynamic type system of Python makes it easier for developers to
write Python programs, it also introduces type errors at run time. There exist
rule-based approaches for automatically repairing Python type errors. These
approaches can generate accurate patches, but they require domain experts to
design patch synthesis rules and suffer from low template coverage of
real-world type errors. Learning-based approaches alleviate the manual effort
of designing patch synthesis rules. Among the learning-based approaches, the
prompt-based approach, which leverages the knowledge base of code pre-trained
models via pre-defined prompts, obtains state-of-the-art performance in general
program repair tasks. However, such prompts are manually defined and do not
involve any specific clues for repairing Python type errors, resulting in
limited effectiveness. How to automatically improve prompts with domain
knowledge for type error repair is challenging yet under-explored. In this
paper, we present TypeFix, a novel prompt-based approach with fix templates
incorporated for repairing Python type errors. TypeFix first mines generalized
fix templates via a novel hierarchical clustering algorithm. The identified fix
templates indicate the common edit patterns and contexts of existing type error
fixes. TypeFix then generates code prompts for code pre-trained models by
employing the generalized fix templates as domain knowledge, in which the masks
are adaptively located for each type error instead of being pre-determined.
Experiments on two benchmarks, including BugsInPy and TypeBugs, show that
TypeFix successfully repairs 26 and 55 type errors, outperforming the best
baseline approach by 9 and 14, respectively. In addition, the proposed fix
template mining approach can cover 75% of developers' patches in both
benchmarks, exceeding the coverage of the best rule-based approach, PyTER, by
more than 30%.
Comment: This paper has been accepted by ICSE'2
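A rough way to picture the prompt-construction step is to instantiate a mined fix template around the buggy code and leave the template's hole as a mask for the code pre-trained model to fill. The sketch below is illustrative only; the template format, the `<HOLE>` placeholder, the CodeT5-style mask token, and the helper names are assumptions, not TypeFix's actual implementation.

```python
# Minimal sketch of template-guided prompt construction; the template
# format, <HOLE> placeholder, mask token, and function names are assumed.
MASK = "<extra_id_0>"  # mask token in the style of CodeT5-like models (assumed)

def build_prompt(buggy_line: str, template: str) -> str:
    """Instantiate a generalized fix template around the buggy line,
    leaving the template's hole as a mask for the model to fill."""
    patched = template.replace("<HOLE>", MASK)
    return f"# buggy line:\n{buggy_line}\n# candidate fix:\n{patched}"

# Hypothetical template mined from past type-error fixes: wrap the
# right-hand side in a conversion call chosen by the model.
template = "value = <HOLE>(raw_value)"
print(build_prompt("value = raw_value + 1  # TypeError: str + int", template))
```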
Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
Adapting Deep Learning (DL) techniques to automate non-trivial coding
activities, such as code documentation and defect detection, has been
intensively studied recently. Learning to predict code changes is one of the
popular and essential investigations. Prior studies have shown that DL
techniques such as Neural Machine Translation (NMT) can benefit meaningful code
changes, including bug fixing and code refactoring. However, NMT models may
encounter bottleneck when modeling long sequences, thus are limited in
accurately predicting code changes. In this work, we design a Transformer-based
approach, considering that Transformer has proven effective in capturing
long-term dependencies. Specifically, we propose a novel model named DTrans.
For better incorporating the local structure of code, i.e., statement-level
information in this paper, DTrans is designed with dynamically relative
position encoding in the multi-head attention of Transformer. Experiments on
benchmark datasets demonstrate that DTrans can more accurately generate patches
than the state-of-the-art methods, increasing the performance by at least
5.45%-46.57% in terms of the exact match metric on different datasets.
Moreover, DTrans can locate the lines to change with 1.75%-24.21% higher
accuracy than the existing methods.
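The core architectural idea, relative position information at statement granularity injected into multi-head attention, can be sketched as a learned bias over clipped statement distances. The module below is a generic relative-position-bias sketch in that spirit, not the authors' exact formulation; the clipping range and the embedding-based parameterization are assumptions.

```python
import torch
import torch.nn as nn

class StatementRelativeBias(nn.Module):
    """Sketch of a relative position bias keyed on statement distance;
    the exact DTrans formulation may differ."""

    def __init__(self, num_heads: int, max_distance: int = 32):
        super().__init__()
        self.max_distance = max_distance
        # one learned bias per (clipped) statement distance and attention head
        self.bias = nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, stmt_ids: torch.Tensor) -> torch.Tensor:
        # stmt_ids: (seq_len,) statement index of each token
        dist = stmt_ids[None, :] - stmt_ids[:, None]           # (L, L)
        dist = dist.clamp(-self.max_distance, self.max_distance)
        bias = self.bias(dist + self.max_distance)              # (L, L, H)
        return bias.permute(2, 0, 1)  # (H, L, L): added to attention scores

# Hypothetical usage: tokens 0-2 belong to statement 0, tokens 3-5 to statement 1.
stmt_ids = torch.tensor([0, 0, 0, 1, 1, 1])
print(StatementRelativeBias(num_heads=4)(stmt_ids).shape)  # torch.Size([4, 6, 6])
```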
Generative Type Inference for Python
Python is a popular dynamic programming language, evidenced by its ranking as
the second most commonly used language on GitHub. However, its dynamic type
system can lead to potential type errors, which has prompted researchers to
explore automatic type inference approaches for Python programs. The rule-based type
inference approaches can ensure the accuracy of predicted variable types, but
they suffer from low coverage problems. Supervised type inference approaches,
while feature-agnostic, require large, high-quality annotated datasets and are
limited to pre-defined types. As zero-shot approaches, the cloze-style
approaches reformulate the type inference problem into a fill-in-the-blank
problem. However, their performance is limited.
This paper introduces TypeGen, a few-shot generative type inference approach
that incorporates static domain knowledge from static analysis. TypeGen creates
chain-of-thought (COT) prompts by translating the type inference steps of
static analysis into prompts based on the type dependency graphs (TDGs),
enabling language models to learn from how static analysis infers types. By
combining COT prompts with code slices and type hints, TypeGen constructs
example prompts from human annotations. TypeGen only requires very few
annotated examples to teach language models to generate similar COT prompts via
in-context learning. Moreover, TypeGen enhances the interpretability of results
through the use of the input-explanation-output strategy. Experiments show that
TypeGen outperforms the best baseline Type4Py by 10.0% in argument type
prediction and 22.5% in return value type prediction in terms of top-1 Exact
Match by using only five examples. Furthermore, TypeGen achieves substantial
improvements of 27% to 84% compared to the zero-shot performance of large
language models with parameter sizes ranging from 1.3B to 175B in terms of
top-1 Exact Match.
Comment: This paper has been accepted by ASE'2
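At a high level, this style of prompting assembles a few demonstration examples, each pairing a code slice with a chain of thought derived from the type dependency graph and the ground-truth type, followed by the query slice. The sketch below shows one way such a prompt could be assembled; the wording, slice format, and example data are assumptions rather than TypeGen's actual prompt template.

```python
# Illustrative sketch of assembling a few-shot, chain-of-thought type
# inference prompt; the prompt wording and example data are assumptions.
def cot_from_tdg(steps):
    """Render type-dependency-graph inference steps as a chain of thought."""
    return "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))

def build_prompt(examples, query_slice):
    """examples: list of (code_slice, tdg_steps, answer) drawn from annotations."""
    parts = []
    for code, steps, answer in examples:
        parts.append(f"Code:\n{code}\nReasoning:\n{cot_from_tdg(steps)}\nType: {answer}\n")
    parts.append(f"Code:\n{query_slice}\nReasoning:")
    return "\n".join(parts)

# Hypothetical demonstration: infer the return type of a small function.
example = (
    "def area(r):\n    return 3.14 * r * r",
    ["`3.14` is a float literal",
     "multiplying by a float literal yields a float, so the return type is float"],
    "float",
)
print(build_prompt([example], "def double(x):\n    return x + x"))
```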